Goals of Data Visualization

We aim to display the distribution of asthma cases from 2011-2021 using box or violin plots. These distributions guide the rest of the analysis: we then plot temperatures for the states with the highest density of asthma cases. We intend to draw insight on the following questions regarding our hypothesis:

  • Which states have the most asthma cases?
  • Do these states have higher or lower temperatures compared to the rest of the US?
  • Have temperatures changed over time?
  • Do regions with higher asthma prevalence overlap with areas of lower average temperature?
  • Does analyzing prevalence data stratified by time and temperature make a difference?

Data Merging

The merging process combines three datasets (asthma data, weather data, and geographic shapefiles) into a single comprehensive dataset (final_df) that can be used for mapping and statistical analysis. This merged dataset is the foundation for the subsequent visualizations and analyses, enabling the exploration of relationships between asthma prevalence, geographic regions, and environmental factors.

To see how we merged the datasets for our visualizations, click “Show”.

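Loading the state shapefile, asthma, and temperature data:

library(tidyverse)  # read_csv(), dplyr verbs, ggplot2, purrr::map(), tidyr::unnest()
library(sf)         # geom_sf() support, st_drop_geometry()
library(plotly)     # ggplotly(), plot_ly()
library(DT)         # datatable(), formatRound()

shape_files = usmap::us_map()

asthma_df = read_csv("data/asthma_data.csv") |>
  mutate(year = year_name)  # copy year_name into year so it can serve as a join key

weather_df = read_csv("data/temp_data.csv")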

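Merging asthma, temperature, and shapefile data:

# Attach temperature records to each asthma record by state and year
asthma_weather =
  asthma_df |>
  left_join(weather_df, by = c("state", "year"))

# Attach state geometry, matching on the state abbreviation,
# then drop rows with any missing value
final_df =
  shape_files |>
  mutate(state = abbr) |>
  left_join(asthma_weather, by = "state") |>
  drop_na()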
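
Because `drop_na()` discards every row that has a missing value, it can be useful to see which state-year keys fail to match before they are dropped. The following is a minimal sketch on made-up data, not the project data: `toy_asthma` and `toy_weather` are hypothetical stand-ins for `asthma_df` and `weather_df`, and their values are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for asthma_df and weather_df
toy_asthma <- tibble(
  state = c("NY", "NY", "ME", "TX"),
  year = c(2020, 2021, 2021, 2021),
  prevalence_percent = c(9.8, 10.1, 12.3, 7.4)
)
toy_weather <- tibble(
  state = c("NY", "NY", "ME"),
  year = c(2020, 2021, 2021),
  avg_temp = c(9.1, 9.4, 6.2)
)

# Asthma rows that have no matching weather record
unmatched <- anti_join(toy_asthma, toy_weather, by = c("state", "year"))

# The same kind of left join used above; unmatched rows get NA for avg_temp
joined <- left_join(toy_asthma, toy_weather, by = c("state", "year"))

unmatched
sum(is.na(joined$avg_temp))
```

Running a check like this before `drop_na()` shows whether rows are lost because of genuinely missing measurements or because of mismatched keys.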

Mapping Prevalence Data by State:

Adult asthma prevalence by state for the most recent year (2021) is shown below:

# Named prevalence_map rather than ggplot to avoid shadowing the ggplot() function
prevalence_map =
  final_df |>
  filter(year == 2021) |>
  ggplot() +
  geom_sf(aes(fill = prevalence_percent), color = "white") +
  scale_fill_viridis_c(na.value = "grey90") +
  theme_minimal() +
  labs(
    title = "Adult Asthma Prevalence by State (2021)",
    fill = "Prevalence (%)"
  ) +
  theme(
    panel.grid = element_blank(),
    axis.text = element_blank(),
    axis.title = element_blank(),
    axis.ticks = element_blank()
  )

# Convert the static map to an interactive plotly widget
prevalence_map |>
  ggplotly()

States in the Northeast and Mid East regions of the US had a higher prevalence of adult asthma, as shown by the lighter green and yellow hues. This is surprising, as these are not southern states, which experience warmer temperatures and seasons. We explore these trends further next.

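Asthma Trend Across the US Over Time:

# Average prevalence across all states for each year
aggregated_data =
  final_df |>
  st_drop_geometry() |>  # drop geometry so summarise() does not union state polygons
  group_by(year) |>
  summarise(avg_prevalence = mean(prevalence_percent, na.rm = TRUE))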

aggregated_data |>
  ggplot(aes(x = year, y = avg_prevalence)) +
  geom_line() +  # Line for the time series
  geom_point(color = "red") +  # Scatter points
  labs(
    title = "Adult Asthma Trend Across the US Over Time",
    x = "Year",
    y = "Average Asthma Prevalence (%)"
  ) +
  theme_minimal()

The map shows that asthma prevalence in adults varies across the US. Moreover, the line plot shows that average asthma prevalence has been steadily increasing over time. Could this increase, and the differences between states, be associated with rising temperatures and with temperature differences between regions of the country?

Distribution of Asthma Across States:

The box plot method provided a clearer visual representation compared to the violin density plot, effectively illustrating the distribution of data and highlighting the concentration of values within each state.

ggplotly(
  final_df |>
  # States are ordered left to right by mean prevalence (the reorder() default)
  ggplot(aes(x = reorder(full, prevalence_percent), y = prevalence_percent, fill = state)) +
  geom_boxplot() +
  labs(title = "Distribution of Adult Asthma Across States") +
  xlab("State") +
  ylab("Asthma Prevalence (%)") +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
    legend.position = "none"
  ))

This plot shows the distribution of asthma prevalence within each state across all years from 2011 to 2021. States are ordered by prevalence, so states further right have higher asthma prevalence than those to their left. States in the upper Northeast, such as Maine, Rhode Island, Vermont, and New Hampshire, had consistently higher asthma prevalence over this period than other states. The states with some of the lowest asthma prevalence were Texas, South Dakota, Florida, Nebraska, and Minnesota.

Distribution of Temperatures Across States:

ggplotly( 
  final_df|> 
  group_by(full)|>
  ggplot(aes(x= reorder(full, avg_temp), y= avg_temp, fill = state))+ 
  geom_boxplot()+
  labs(title= "Distribution of Temperature Across States" )+
  xlab("State")+
  ylab("Temperature (C)")+
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
    legend.position = "none"
  ))

When comparing states with higher asthma prevalence to their temperature distributions, we observe that states such as Maine, Rhode Island, Vermont, and New Hampshire exhibit lower median average temperatures and larger interquartile ranges. This variation reflects the significant seasonal temperature fluctuations in these regions. Notably, when comparing the upper and lower 25% of temperature distributions in these states, the temperatures in the upper quartile are more variable. This variability is important to highlight, as previous studies have suggested that extreme temperatures may increase the risk of asthma exacerbation. Overall, varying temperature conditions (both high and low extremes) might influence asthma prevalence across states.
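
The median and interquartile range that the box plots display can also be computed directly. The following is a minimal sketch on invented numbers: `toy_temp` is a hypothetical data frame, not the project data, and the two state labels are used only to make the grouping concrete.

```r
library(dplyr)

# Hypothetical toy temperatures (not the project data)
toy_temp <- tibble(
  state = rep(c("ME", "FL"), each = 4),
  avg_temp = c(-5, 5, 10, 20, 18, 22, 26, 28)
)

# Median and interquartile range of temperature within each state
temp_spread <- toy_temp |>
  group_by(state) |>
  summarise(
    median_temp = median(avg_temp),
    iqr_temp = IQR(avg_temp),
    .groups = "drop"
  )

temp_spread
```

In this toy example the state with the lower median also has the wider interquartile range, which is the pattern described above for the northeastern states.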

Does the Trend in Prevalence Differ by State?

The analysis reveals state-by-state trends in asthma prevalence over time. For states with significant positive slopes, public health initiatives should focus on identifying the underlying causes (such as environmental factors, healthcare access, or socio-economic disparities) driving the increase in asthma cases. Conversely, states with non-significant or negative trends may reflect the effectiveness of existing public health policies or different underlying factors contributing to asthma prevalence.

# Compute state trends: slope of prevalence on year, fitted separately per state
state_trends <- final_df |>
  st_drop_geometry() |>  # drop geometry first so summarise() returns a plain tibble
  group_by(state) |>
  summarise(model = list(lm(prevalence_percent ~ year))) |>
  mutate(model_summary = map(model, broom::tidy)) |>
  unnest(model_summary) |>
  filter(term == "year") |>
  select(state, slope = estimate, p_value = p.value)

# Render the interactive table with 4 decimal places
datatable(
  state_trends,
  options = list(
    pageLength = 10, # Default number of rows displayed
    lengthMenu = c(5, 10, 25, 50, 100), # Options for rows per page
    scrollX = TRUE # Enable horizontal scrolling if needed
  ),
  rownames = FALSE # Remove row names
) %>%
  formatRound(columns = c("slope", "p_value"), digits = 4)

In summary, while some states show increasing asthma prevalence over time, a deeper investigation into local factors is needed to understand these trends. The interactive table provides a clear and concise way to explore these trends across states, helping to prioritize actions where they are most needed.
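
One way to turn a table of slopes and p-values into a short list of states to prioritize is to label each trend by sign and significance. The following is only a sketch under stated assumptions: `toy_trends` is hypothetical data, and the 0.05 cutoff and the Benjamini-Hochberg adjustment (`p.adjust(method = "BH")`) are illustrative choices that the analysis above does not apply.

```r
library(dplyr)

# Hypothetical slopes and p-values (not the project results)
toy_trends <- tibble(
  state = c("A", "B", "C", "D"),
  slope = c(0.30, 0.05, -0.06, 0.01),
  p_value = c(0.0001, 0.03, 0.01, 0.60)
)

flagged <- toy_trends |>
  mutate(
    # Adjust for testing many states at once (assumed choice: Benjamini-Hochberg)
    p_adj = p.adjust(p_value, method = "BH"),
    trend = case_when(
      p_adj < 0.05 & slope > 0 ~ "increasing",
      p_adj < 0.05 & slope < 0 ~ "decreasing",
      TRUE ~ "no clear trend"
    )
  )

flagged
```

Adjusting the p-values matters here because one regression is fitted per state, so some unadjusted p-values below 0.05 are expected by chance alone.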

Does Correlation Differ by State?

# Correlation between temperature and prevalence, tested separately per state
cor_results <- final_df |>
  st_drop_geometry() |>  # geometry is not needed for the correlation table
  group_by(state) |>
  summarise(
    cor_test = list(cor.test(avg_temp, prevalence_percent)),
    .groups = "drop"
  ) |>
  mutate(
    corr = map_dbl(cor_test, \(x) unname(x$estimate)),  # Extract correlation coefficient
    p_value = map_dbl(cor_test, \(x) x$p.value)         # Extract p-value
  ) |>
  select(state, corr, p_value)

# Render the interactive table
datatable(
  cor_results,
  options = list(
    pageLength = 10,
    lengthMenu = c(5, 10, 25, 50, 100),
    scrollX = TRUE
  ),
  rownames = FALSE
) %>%
  formatRound(columns = c("corr", "p_value"), digits = 4)

The correlation between prevalence and average temperature is small in every state. Could this mean we need to stratify by other factors (such as income level or race)?

Have Temperatures Changed Over Time?

line_plotly = 
  plot_ly(data = temp_yearly_df, 
          x = ~year_name, 
          y = ~avg_temp_yearly, 
          color = ~state, 
          type = 'scatter', 
          mode = 'lines' ) %>% 
  layout(title = "Yearly Average Temperature by State Over Time", 
          xaxis = list(title = "Year"), 
          yaxis = list(title = "Yearly Average"))

heat_plotly =
  plot_ly(data = temp_yearly_df, 
          x = ~year_name, 
          y = ~state, 
          z = ~avg_temp_yearly, 
          type = "heatmap", 
          colorscale = "Viridis" ) %>% 
  layout(title = "Heatmap of Yearly Averages by State and Year", 
         xaxis = list(title = "Year"),
         yaxis = list(title = "State"), 
         colorbar = list(title = "Yearly Avg"))
line_plotly
heat_plotly

As the graphs above show, average temperatures varied across states within each year and varied within each state over the years. How these variations affect asthma prevalence is explored on the maps and regression pages.

Initial Statistical Analysis

# Bars show mean temperature and the line with points shows mean prevalence for each state
merged_df |> 
  mutate(state = reorder(state, avg_temp_yearly)) |>  # order states by mean temperature
  group_by(state) |> 
  summarize(prevalence = mean(prevalence_percent),
            temp = mean(avg_temp_yearly)) |> 
  ggplot(aes(x = state)) +
    geom_col(aes(y = temp), fill = "skyblue") +
    geom_line(aes(y = prevalence, group = 1)) +
    geom_point(aes(y = prevalence), color = "red") +
    # The secondary axis is an identity transform, so both axes share one numeric scale
    scale_y_continuous(
      name = "Temperature",
      sec.axis = sec_axis(~., name = "Prevalence (%)")
    ) +
  labs(title = "Temperature and Prevalence by State") + 
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

Does Correlation Between Asthma and Temperature Differ by State?

cor_results =
  final_df |>
  group_by(state) |>
  summarise(
    cor_test = list(cor.test(avg_temp, prevalence_percent)),
    .groups = "drop") |>
  rowwise() |>
  mutate(
    corr = cor_test[["estimate"]],
    p_value = cor_test[["p.value"]]) |>
  ungroup() |>
  select(-cor_test) |>
  st_drop_geometry() |>
  knitr::kable(digits = 5, align = "c")

cor_results
state corr p_value
AK 0.05783 0.70924
AL 0.01064 0.94533
AR 0.01423 0.92693
AZ 0.06132 0.69254
CA -0.00830 0.95734
CO -0.02281 0.88315
CT 0.03076 0.84290
DE 0.02194 0.88759
FL -0.07174 0.65999
GA -0.00238 0.98777
HI -0.11770 0.44673
IA 0.02337 0.88032
ID -0.00193 0.99009
IL 0.00651 0.96656
IN -0.07432 0.63162
KS -0.02003 0.89729
KY 0.02177 0.88845
LA -0.00176 0.99095
MA -0.01240 0.93634
MD 0.03159 0.83867
ME 0.00038 0.99807
MI -0.03001 0.84668
MN -0.01584 0.91874
MO -0.01098 0.94361
MS 0.04062 0.79347
MT -0.06500 0.67509
NC 0.01004 0.94843
ND 0.02529 0.87057
NE -0.00178 0.99084
NH 0.03758 0.80865
NJ 0.02442 0.88108
NM 0.05329 0.75411
NV -0.00966 0.95036
NY -0.04588 0.76745
OH -0.01605 0.91765
OK 0.01914 0.90185
OR 0.00184 0.99056
PA 0.02032 0.89585
RI 0.02585 0.86772
SC 0.02651 0.86438
SD -0.02918 0.85084
TN 0.03026 0.84540
TX 0.00805 0.95866
UT -0.05871 0.70503
VA 0.00848 0.95642
VT -0.02405 0.87683
WA -0.03590 0.81706
WI -0.05946 0.70143
WV 0.05704 0.71304
WY 0.01313 0.93258

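Every state's correlation between prevalence and average temperature is close to zero (the largest in magnitude is Hawaii at -0.118), and none is statistically significant (all p-values exceed 0.4). Could this mean we need to stratify by other factors (such as income level or race)?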

Does the Trend in Asthma Prevalence Over Time Differ by State?

# Fit prevalence ~ year separately for each state
state_trends =
  final_df |>
  group_by(state) |>
  summarise(
    model = list(lm(prevalence_percent ~ year)))

# Keep the slope on year and its p-value, then render as a table
state_trends =
  state_trends |>
  mutate(
    model_summary = map(model, broom::tidy)) |>
  unnest(model_summary) |>
  filter(term == "year") |>
  select(
    state,
    slope = estimate,
    p_value = p.value) |>
  st_drop_geometry() |>
  knitr::kable(digits = 5)

state_trends
state slope p_value
AK 0.06182 0.00868
AL 0.16455 0.00001
AR 0.02909 0.25949
AZ 0.06091 0.00049
CA 0.02636 0.29868
CO 0.16364 0.00000
CT 0.09364 0.00000
DE 0.04727 0.16551
FL -0.05152 0.09681
GA 0.00455 0.86590
HI -0.06182 0.04633
IA 0.08273 0.00179
ID 0.10000 0.00000
IL 0.03455 0.07010
IN 0.02182 0.28358
KS 0.18364 0.00000
KY 0.05545 0.17836
LA 0.19909 0.00000
MA -0.00909 0.76559
MD 0.04636 0.00314
ME 0.01455 0.59673
MI 0.09091 0.00002
MN 0.11364 0.00000
MO -0.05727 0.01216
MS 0.22636 0.00000
MT 0.11455 0.00003
NC 0.03455 0.19492
ND 0.03182 0.09917
NE 0.10545 0.00000
NH 0.14909 0.00023
NJ -0.00400 0.87883
NM 0.04292 0.17131
NV 0.19000 0.00000
NY -0.01636 0.41568
OH 0.01727 0.47857
OK 0.12545 0.00000
OR 0.05091 0.01235
PA 0.10091 0.00000
RI 0.09364 0.00184
SC 0.12182 0.00000
SD 0.09909 0.00123
TN 0.30727 0.00000
TX 0.07182 0.00022
UT 0.14455 0.00000
VA 0.05636 0.00440
VT 0.05636 0.01882
WA 0.04818 0.00625
WI 0.10182 0.00312
WV 0.32091 0.00000
WY 0.09182 0.00036